- 
                Notifications
    You must be signed in to change notification settings 
- Fork 14
Preparatory splitstream format changes for ostree support #185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3803dc3    to
    9aedd96      
    Compare
  
    |  | ||
| use crate::{sha256_from_descriptor, sha256_from_digest, tar::split_async, ContentAndVerity}; | ||
|  | ||
| pub const TAR_LAYER_CONTENT_TYPE: u64 = 0x2a037edfcae1ffea; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment for where these came from? I'm guessing random? If so just add a comment // Random unique ID ?
That said I wonder if it wouldn't be nicer to store (variable length) strings for this in the format? Maybe it could go all the way to literally suggested to be the mediaType from OCI (if applicable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. These are random.
But, I'd rather avoid having variable length things in the header. That makes parsing it much more tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make it a real uuid tho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine keeping as u64 too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments here
        
          
                crates/composefs/src/repository.rs
              
                Outdated
          
        
      | pub fn has_named_stream(&self, name: &str) -> bool { | ||
| let stream_path = format!("streams/refs/{}", name); | ||
|  | ||
| readlinkat(&self.repository, &stream_path, []).is_ok() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like "swallowing" errors like this, I'd say call stat instead and require it's S_IFLNK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I redid this with stat()
| #[derive(Clone, Debug, FromBytes, Immutable, IntoBytes, KnownLayout)] | ||
| #[repr(C)] | ||
| pub struct MappingEntry { | ||
| pub body: Sha256Digest, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're changing the format here...I think it'd be nice to make this one extensible.
However...bigger picture there's another consideration: There's obviously a metric ton of binary serialization formats out there. A custom one isn't wrong necessarily but...how about say CBOR ? It has some usage and a proper RFC etc.
I guess a dividing line is "do we care about mmap()"? Probably not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
splitstreams are essentially thin wrappers of existing binary formats (tars, ostree objects, etc), adding just references to other composefs repo objects. I'm not sure its overly helpful to use a complicated binary format for the wrapping, especially one which is completely different from the inner format.
That said, I agree that we should make it at least a bit extensible. This MR adds a magic header, but also adding a version field and a few bytes of unused/unparsed space does seem quite useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed mmap, and the end result was, no, we don't want it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
splitstreams are essentially thin wrappers of existing binary formats (tars, ostree objects, etc), adding just references to other composefs repo objects. I'm not sure its overly helpful to use a complicated binary format for the wrapping, especially one which is completely different from the inner format.
Yeah, though the nice thing about Rust here is that for stuff like this there's a lot of well-done crates.
It also makes it a lot more obvious and easy to parse from other languages too if we can say "it's just CBOR" (or whatever).
Anyways: I'm basically fine with this as is too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although what about the algorithm agility? There's been some thoughts that for post-quantum crypto we may need to get away from sha256 in theory as far as I understand things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the right thing is to add a header size field, and skip parts we don't understand. Then we can easily extend this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a new extension_size field which we skip on read.
9aedd96    to
    057121b      
    Compare
  
    This changes the splitstream format a bit, with the goal of allowing splitstreams to support ostree files as well (see containers#144) The primary differences are: * The header is not compressed * All referenced fs-verity objects are stored in the header, including external chunks, mapped splitstreams and (a new feature) references that are not used in chunks. * The mapping table is separate from the reference table (and generally smaller), and indexes into it. * There is a magic value to detect the file format. * There is a magic content type to detect the type wrapped in the stream. * We store a tag for what ObjectID format is used * The total size of the stream is stored in the header. The ability to reference file objects in the repo even if they are not part of the splitstream "content" will be useful for the ostree support to reference file content objects. This change also allows more efficient GC enumeration, because we don't have to parse the entire splitstream to find the referenced objects. Signed-off-by: Alexander Larsson <alexl@redhat.com>
057121b    to
    bed66dc      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's talk about this. I might have a bit of bandwidth to work on this if you like.
| struct SplitstreamHeader { | ||
| magic: [u8; 7], // Contains SPLITSTREAM_MAGIC | ||
| algorithm: u8, // The fs-verity algorithm used, 1 == sha256, 2 == sha512 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also do fs-verity block size here. That's usually expressed as a bit-shift count, so 12 or 16...
We could also write it like "fsverity-sha256-12" or so as a string... some relevant discussion in #181.
| n_refs: u64, | ||
| n_mappings: u64, | ||
| refs: [ObjectID; n_refs] // sorted | ||
| mappings: [MappingEntry; n_mappings] // sorted by body | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's so sketch that we hardcode sha256 here... I think that's probably OK, but maybe we'd add an extension mechanism so we could add new types of mappings tables...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this from the start:
- magic
- n_sections
- n_sections * (
- section_start
- section_size_in_bytes
 
- )
We could name the sections but I think it's quite OK to just know what the numbers mean and require that they're all present, in order. An empty section would be denoted by a zero size.
We could also get into compat vs. uncompat extensions... not sure how far it's worth going here...
| We should get this in probably very soon and then I think declare the format stable? | 
| 
 I have a branch..... lol | 
| One of the things that I'm tormenting myself on a bit right now is the sha256 mapping.  I'm considering changing it to a general-purpose "named object reference" mapping: we could then have a hashmap mapping names like "sha256:12345" to the object ID in question, and adding sha512 would be seamless.  I think I'd chose to encode that as a nul-separated series of strings of the form  The alternative is to stay with what we have now more or less, but it's much less flexible and is gonna be cruft one day, I'm sure. That being said: we have an extensibility mechanism now, at least... The other thing that really needs fixing in @alexlarsson's work vs. the current version of the branch is that we should really take advantage of the fact that we have the references array out front now and use indexes into it from the splitstream content instead of repeating the whole object ID. It would have the additional advantage of ensuring that it became physically impossible to refer to an object that wasn't listed in the "depends" header (which we'll use during GC) and yet another advantage in that we could use the 64bit "starting word" for each internal/external section in the stream for both cases: 
 (or with the high bit as a flag or whatever). It's just "work" to get this over the line.... After that, I think we need to figure out a way to kill off the content-sha256-based naming of splitstreams and perhaps even consider getting rid of the streams/ directory entirely... each backend would have its own way of 'caching' interesting splitstream objects for itself. I'm not entirely sure how I'd do that for the OCI backend. In a related conversation with @Johan-Liebert1 we discussed having the layers (and possibly the config) ((and possibly possibly the manifest some day)) referenced from the erofs image itself as a way to optionally prevent those things from being GC'd. We'd probably want some sort of a better "lookup table" still, though... but the key difference is that this table would be unique to OCI, not some global "streams" directory that we try to pretend is sharable by everyone on equal footing... | 
This changes the splitstream format a bit, with the goal of allowing splitstreams to support ostree files as well (see #144), but it it imho a generally nice change.
The primary differences are:
The ability to reference file objects in the repo even if they are not part of the splitstream "content" will be useful for the ostree support to reference file content objects.
This change also allows more efficient GC enumeration, because we don't have to parse the entire splitstream to find the referenced objects.